(54) Method of using primary and secondary processors

(57) The invention relates to the compilation of source code to a primary and a secondary processor. It relates to reconfigurable secondary processors, and is especially relevant to secondary processors which can be reconfigured to some degree during execution of code. Selective extraction of dataflows from the source code is followed by transformation of the extracted dataflows into trees. The trees are then matched against each other to determine minimum edit cost relationships for transformation of one tree into another, where these minimum edit cost relationships are determined by the architecture of the secondary processor. A group or a plurality of groups of dataflows is determined on the basis of said minimum edit cost relationships, and for each group a generic dataflow capable of supporting each dataflow in that group is created. The generic dataflow or dataflows is then used to determine the hardware configuration of the secondary processor, and calls to the secondary processor for said group or plurality of groups of dataflows are substituted into the source code. The resultant source code is compiled to the primary processor.

The resulting efficient configuration thus reduces either the expense of reconfiguration (in a field programmable array), or the silicon area (in an application specific integrated circuit).
Description



{0001} The presrt inward relates to 

sistingc* a primary processor arri 
s 6*tfefr rdavant to toe archBeciures ernpioyinQ a leoortfigurabie secondary processor* ■■■■ 

{00021 A primary processa- such as a Pentium 
. Corporator- has ewo^ 

being cptmised tor any of 

operations, Such as pa^e1sti>^^ 
[0003] An approach taken to solve this problem is the development of integrated circuits specifically adapted for particular applications. These are known as ASICs, or application-specific integrated circuits. Tasks for which such an ASIC

tsadapted are generelyperiormed very w^ 

it not coined CtearV. a specie KJ can to 

applications tat are not oenfaaltplhe operation of a cojtjx^^ 
*5 puter. ftisthuspartodertyedvai^ 

catkins as required the cal 

a fine-ga^ed processor^ 

Such structures can be tised a&WependentpiD(«aOT 

use as coprocessors* " 

[0004] Such configurable coprocessors have the potential to improve the performance of the primary processor for particular tasks: code run inefficiently [...] efficiently to an adapted coprocessor which has been optimised for that application. With continued development of such "application-specific" secondary processors, the possibility of improving performance by extracting difficult code to a custom coprocessor becomes more attractive. A particularly important example in general computing is the extraction of loop bodies in image handling.

10005] Toobtantoecfesiredeffkaen^ 

divided between primary and secondary processors, arito configure the secoKfeuyprocB 
ite assigned part of the cocte> One approach is to r^^ 
sorstroctures. to *AC++compSer tor FPGA custom execu^ 
so (EEE Sympos^ on FPGAstr Custom . 
which involves irapptog of 

of the initial code by the programmer. This approach relies on the initial programmer making a good choice of code to extract initially.

[0006] An alternative approach is to assess the initial code to determine which are the most appropriate elements to direct to the secondary processor [...] Reiner W. Hartenstein, Jürgen Becker and Rainer Kress, in Int. IEEE Symposium on Engineering of Computer Based Systems (ECBS), Friedrichshafen, Germany, March 1996, discusses a codesign tool which incorporates a profiler to assess which parts of an initial code are suitable for allocation to a [...] followed by an iterative procedure allowing the compilation of a subset of C code [...] architecture so that the extracted code can be mapped to the coprocessor. This approach does expand the usage of the secondary processor but does not fully realize the potential of reconfigurable logic [...]

10007] ComjBraUe approaches have been piopo^ 

approach discussed to "Dat^pathOrion^FPQA Mapping 

presented at FCCM'97, Symposium on Field-Programmable Custom Computing Machines, April 16-18 1997, Napa Valley, California (currently available on the World Wide Web at ht^Avwwicsjberto (ey^eo^rojc'C t iJufos J/l^^fccn\DcsterJhwrtxp^), uses template structures representative of an FPGA architecture to assist in the mapping of source code on to FPGA structures. Source code samples are rendered as directed acyclic graphs, or DAGs, and then reduced to trees. These and other basic graph concepts are set out, for example, in "High Performance Compilers for Parallel Computing", Michael Wolfe, pages 49 to 56, Addison-Wesley, Redwood City, 1996,

& biitabriefdefiiflksnofaDAG 

[0008] A graph consists of a set of nodes, and a set of edges: each edge is defined by a pair of nodes (and can be considered graphically as a line joining those nodes). In a directed graph, each edge has a direction. If it is possible to define a path within a graph from one node back to itself, then the graph is cyclic: if not, then the graph is acyclic. A DAG is a graph that is both directed and acyclic: it is thus a hierarchical structure. A tree is a specific kind of DAG. A tree has a single source node termed "root", and there is a unique path from root to every other node in the tree. If there is an edge from X to Y in a tree, then X is termed the parent of Y and Y is termed the child of X. In a tree, a child has exactly one parent, whereas in a general DAG a child can have more than one parent.
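The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the adjacency-list representation (a dict from each node to its children) and the names `is_acyclic` and `is_tree` are assumptions made for this example and do not come from the source.

```python
def all_nodes(edges):
    """Collect every node that appears as a parent or as a child."""
    return set(edges) | {c for children in edges.values() for c in children}


def is_acyclic(edges):
    """Return True if no path leads from a node back to itself.

    edges: dict mapping each node to a list of its child nodes.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for child in edges.get(node, []):
            s = state.get(child, WHITE)
            if s == GREY:  # child is on the current path: a cycle
                return False
            if s == WHITE and not visit(child):
                return False
        state[node] = BLACK
        return True

    return all(state.get(n, WHITE) != WHITE or visit(n) for n in all_nodes(edges))


def is_tree(edges):
    """A tree is a DAG with a single root in which every other node has one parent."""
    nodes = all_nodes(edges)
    parent_count = {n: 0 for n in nodes}
    for children in edges.values():
        for child in children:
            parent_count[child] += 1
    roots = [n for n in nodes if parent_count[n] == 0]
    return (len(roots) == 1
            and all(count <= 1 for count in parent_count.values())
            and is_acyclic(edges))
```

In this sketch a shared subexpression (a node with two parents) makes a graph a DAG but not a tree, which is exactly the distinction drawn in the paragraph above.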
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{0009} hJhewcttrjfTimCa!l^&Jc^ 

is a aeneraHyavaiable software tool, and its^pplicafion bctesqfcedffTA 

Pataget^C Compter Desip*^ 
s irings PUfehng Ca, *kl, Rod^CitY, 1995, especfcay at pp 373^7. Ifauig tates as ^ me sourc e code tees 
and patKcns this inpii into chur^ 
tree owm approach Is ess^aly deleft 

cornjtac ft JrwoM* a btftonvup matching of a tree>lh patterns. recoKfing al posstte matches, followed t^r a top- , 
reduction pass to o^ 

*b dgrifcant W consftairt bl the ^ Wcrflhe predefined sal c* afcw^ psUeras, and.cfoes nrt W 

tittteofaTBConfigiiable archftecftra ...» / 
I0W0J Thereisthusaneedtode^techhiq^ 
. temsifMMv apifm»yand secondary processor, ty ti^ 

secondary processor, which can then be configured as efficiently as possible to run the extracted code, with a view to maximising the performance efficiency of the primary and secondary processor system in executing the input code.
[0011] Accordingly, the invention provides [...] comprising: selective extraction of dataflows from the source code; transformation of the extracted dataflows into trees; matching of the trees against each other [...] tree into another; determining a group or a plurality of groups of dataflows on the basis of said minimum edit cost relationships; creating for each group [...] source code calls to the secondary processor [...] resultant source code to the primary [...]

[OOig Trfe approach aBows far opti^ 

25 wflhcuipreWP*^ 

account of the demands and requirements of the secondary processor architecture. Advantageously, said minimum edit

costreiatoshtea^ 

cosldacotfespond^rec^ 

eofccostfeteficiwn^roen*^ 
so [0013] The tnethodfocfete most ig^ap^ 

alows fat recortBguiafion of the eecwidaiy processor*^ 

ratton dtte secondary processor to be r^ 

c^si^iportedby a generic dat^^ 

essoi; and tteprcosssor hardware^ 
: *s. (sierras iw shown in fte CHESS arct^ 

[pot 4] . AoVan^eousiy;tie genenodala^ 

group on to each other, ttk^ . 
[0015] An atfvantop^ciB approach to cori^^ 
clcrfg«jr« and reduce them to trees by rem^ 
40 . pafttwhwenateafiiodeaiidthBiirt 
whttpe^ through the tero^ 
IfinD^^JMupiatetolheseo 

sensavetofrefrra*rf operations*^ * v 

[0016] Mackarnageous fathers^ 
45 flew fecortparodv^ further dateh^ 

match suffidentry dosely the ge^ 
. preserttotoeojcecofew^ . a / 

[0017] In the approaches indicated above, the removed links are stored after the directed acyclic graphs are

reducedtotriK*andarere^ 
so datdflowL 

[0018] Spectofrntoodcnentsd^ 

par^drawn^d which: 

Figure 1 shows a g^nenal purpose ayrputtfarcn^^ 
ss appfed; 

Rgund2 shows schemaficaly arntir^ 

i^toanerrtodknertdnelivatio^ 

Figure 3 illustrates a step in the conversion of a DAG to [...] embodiment of the invention;

Figure 4a illustrates the step of insertion and deletion of nodes and Figure 4b illustrates the step of substitution of nodes in a tree matching process [...];
Figure 5 shows an edit distance taxonomy [...];
Figure 6 illustrates a generic dataflow [...];

Figure 7 shows a logical interface for allocation of secondary processor resources for a generic dataflow according to an embodiment of the invention;

Figure 8 shows the application of DAGs to dataflows including multiplexers to handle conditional statements; and
Figures 9a to 9d show an illustration [...] according to an embodiment [...]

[0019] The present invention is adapted for compilation of source code to an architecture comprising a primary and

a secondary processor. An example of such an architecture [...]

ikanafgsneialinipo^ 

[...] primary processor 1 and returning responses to it are secondary processor 2 and (optionally) 4. Each secondary processor 2, 4 is adapted to [...]

source code not well handled by the primary processor 1. Secondary processor 4, optionally present here, is a dedicated coprocessor adapted to handle a specific [...]

[...] 4 will be determined by a manufacturer to handle a specific frequently used function. Such coprocessors 4 are not the specific subject of [...]
a specific function, but instead [...]

by the primary processor. The secondary processor 2 is advantageously an app[...]

conventional FPGA, such as the Xilinx 4013 or any other member of the Xilinx 4000 series. An alternative class of

recorfigur^de^ referred to a^ 
zs secorriaryproce^fOTbecorflgu^ 
for an application to be executed by the architecture.

[0020] Also employed in the computer architecture are memory 3, accessed by the primary processor 1 and, for appropriate types of secondary processor 2, by the secondary processor 2, and input/output channel 5. Input/output channel 5 here represents all further channels and [...]
» ^brexanpla by programming) and 

[0021] The present invention is particularly relevant to the optimised partitioning of source code between primary processor 1 and secondary processor 2, which allow for optimal u[...]

the handling of the application [...]

the invention the selection and extraction of code for use in the secondary processor.
95 [0022] The approach takeaacconfiig to an entttfimertofteirwert 
process is a bo^ of source coda In prirx^ 

C code, but the person skilled in the art will readily understand how the techniques described could be adopted with

other languages. For example, the source code could be Java byte code [...]

artfidecAjre of Rgure 1cou^ 
40 If© internet ^ 

[0023] As can be seen from Figure 2, the first step in the process is the identification of [...]

to be executed by the secondary processor 2. [...]

and bufljng ap pro priate rapresenta^one 
. is nonrafr preceded byairaurnialp 
[...] allocation to secondary processors is discussed in, for example, Athanas et al, "An Adaptive Hardware Machine Architecture and Compiler for Dynamic Processor Reconfiguration", IEEE International Conference on Computer Design, 1991, pages 397-400.

[0024] The approach taken here fetobu 
codk An advantage** way to cto this ^ 
of dataflows: an appropriate compiler infrastructure is SUIF, developed by the University of Stanford and documented [...]

for high-performance systems, specifically including systems comprising more than one processor. A standard SUIF utility can be used to convert C code to SUIF. It is then a simple process for [...]
build DAGs by performing a dataflow analysis [...]
[0025] The extraction of DAGs from source code is a conventional step. The next step in the process, as can be seen from Figure 2, is the conversion [...]

of code for execution by the secondary processor 2. DAGs are complex structures, and [...]

manner. Reduction of DAGs to trees allows the aspects of the dataflows most important in determining their mapping to hardware to be retained [...] significantly more effective.
[0026] Discussion [...] reduction of [...]
(as cited above), especially at pages 56 to 60. Different terminology is used here from that used in the cited reference, but equivalent and comp[...]

[0027] The preferred approach followed in the reduction of DAGs to trees [...] between leaf nodes and the root: this is illustrated in Figure 3. The critical path between nodes A and B is in a first

eirix>rtnem of tNs reduce ■ 
» : deffcBtoa acydfc dte^ 

between nodes ftathave the same marine nuntor of mxfes. but th^ - 
f^jrpoeeoftreecon^aicfioa wrtfemaHng an arbitrary 

[...] in mapping the source code successfully is scheduling, which depends on timing information: accordingly where it is necessary to make a choice between alternative "critical paths" it is desirable to choose the one that would take the longest time (in terms of time taken to execute each of the operations represented by the nodes in the path). As is discussed further below, alternative approaches can be adopted which are based more directly on timing information. It is also desirable to adopt a consist[...]

resiftfromesse^rysimto * ; . ? 

BKK81 The process taken in ^plying this ^entodmert • 
» leaf nod* e^pa^epaft 

fndfeated above, lor each IW node to pamwah to greatest 

ha» to same ruitor of nodes. *^e^ 

edocfed am out ir) their edge closest to th6 sta^^ 

^crosMnk^ to Wbtfe reference).^ 
2s niriorBnksaieGtoiedsepa^ 

essor 2, but are not used in determining which source code is to be mapped to the secondary processor.

[0O29J Itfe olwuseposs^ to constriri 

pamdoespeo^epartfcu^^ ^ .. 

wffl haw Reflect on schedufrifcwhfretti 
so erabelrftienceonlanrmgajrt 

represents as best possflbtetfaxri^feet^ 

[0030] Figure 3 shows the application of the process described [...]

shows ftreeiries under cens^ ^ 

represented as a directed acyclic graph with root 126 (variable e) and leaf nodes 121, 129 and 130 as the inputs.
as [0031] tfenowasfcaifltttawa^ 

number of nodes in each path. From node 129 (integer value 2), there is only one path, through nodes 122, 123, 124

and 125. This is then the critical path from leaf node 129 to root node 126, and will be present in the tree. From node

121 ftito present case to rest* of an eariercp^^ 
. mrough nodes 122,1^ 124 and l2S.¥tfiereasthe&eoo^ - f 

4* is to critical path, as ft passes trt^^ 

remaining leaf node 130 (variable b) also has two paths available: one passes [...]

whereas the other passes through [...]
either path can be chosen as the critical path [...]

consistency) it is desirable to operate under an appropriate set of further rules to make the best selection. Such further rules may, for example, be determined on the [...]

[0032] The next step to take is to construct a tree from the critical paths chosen [...]. This is done by

cutting all non-critical paths in their edge closest to the starting point (that is, the [...]

is not also part of a critical path). The first non-critical path to consider is that from node 121 [...]

127, 128 and 125. This can be cut on the edge between nodes 121 and 127 [...]
of edge 151 between nodes 141 (corresponding to 121) and 147 (corresponding to 127) which is [...]

a minor link. The other non-critical path [...] nodes 123, 124 and 125:

this can be cut on the edge between nodes 130 and 123. Again, this cut edge is stored as a minor link.
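The reduction of a DAG to a tree by keeping critical paths and storing the cut edges can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the source: each node keeps only the outgoing edge that starts its longest path (by node count) to the root, and every other outgoing edge is recorded as a minor link. The function name `reduce_dag_to_tree`, the `succ` representation (edges pointing toward the root) and the node names in the usage note are hypothetical.

```python
from functools import lru_cache


def reduce_dag_to_tree(succ):
    """Reduce a DAG to a tree, returning (parent, minor_links).

    succ maps each node to the list of nodes that consume its value,
    so edges point from the leaves (inputs) toward the root (output).
    parent maps each non-root node to the single consumer kept in the tree;
    minor_links lists the cut edges as (node, consumer) pairs.
    """
    @lru_cache(maxsize=None)
    def depth(node):
        # Number of nodes on the longest path from `node` to the root.
        return 1 + max((depth(s) for s in succ.get(node, ())), default=0)

    parent, minor_links = {}, []
    for node, outs in succ.items():
        if not outs:
            continue  # the root has no consumer
        # Keep the edge that begins the longest path; ties go to the first edge.
        keep = max(range(len(outs)), key=lambda i: depth(outs[i]))
        parent[node] = outs[keep]
        minor_links.extend((node, outs[i]) for i in range(len(outs)) if i != keep)
    return parent, minor_links
```

For a small hypothetical graph in which input `x` feeds both a long chain and a short one, only the edge into the long chain survives in the tree, and the other edge is returned as a minor link so that it can be reinserted later.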

[0033] It should be noted that conditionals can be represented in DAGs and so reduced to trees in exactly the same

way as simple equations. An example [...]
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d = b •• 

eke '.• ' 

and shows a multiplexer node 188 and a "less than" operation node 186 in addition to the variable and integer nodes

181, 182, 183 and 184. As the skilled man will appreciate, it will generally be possible to use the approach shown here for source code which can be represented [...]

[0034] The tre^stiucto that is left ^ 
is source code shaukJ be mapp^ 

sJbcwisaparfiGtfarty.B^^ 

^yfcaft nandftrrt^i^^ 

[...] (assuming each node represents a single computational element) because of the inclusion of paths with the maximum number of nodes. As the person skilled in the art will appreciate, alternative approaches to determining which edges are to be removed in conv[...]
tree reduction process is to assign a timing-based weight to every node (based, for example, on the length of time required to execute [...]

path, selecting a path to define the tree accordingly on the basis of, for example, greatest accumulated weight. This
. ajproa^maybeiraeapp 
25 torardinp^rtk^tftoe tirn^ 

tiresv^e, for example, rrirfep^ 
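The timing-weighted alternative mentioned above, in which each node carries a weight and the path with the greatest accumulated weight defines the tree, might be sketched like this. The function name `heaviest_path`, the `succ` and `weight` dictionaries and the weights in the usage note are assumptions for illustration only.

```python
def heaviest_path(succ, weight, leaf):
    """Return (total_weight, path) for the leaf-to-root path of greatest weight.

    succ maps each node to its consumers (edges point toward the root);
    weight maps each node to a timing-based cost, for example an execution time.
    """
    memo = {}

    def best(node):
        if node not in memo:
            tails = [best(s) for s in succ.get(node, ())]
            # Choose the heaviest continuation; the root has none.
            tail_weight, tail = max(tails, key=lambda t: t[0], default=(0, ()))
            memo[node] = (weight[node] + tail_weight, (node,) + tail)
        return memo[node]

    return best(leaf)
```

With hypothetical weights in which a multiply costs more than an add, the path through the multiply is selected even though both paths contain the same number of nodes, which is the situation where node counting alone cannot decide.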

{0035] The nert step m the cziTTpGatoip 

the select of soura cxxie lor the seoon^ 
• comprises a series of sU>-step^ 
st> tfdatadataflca^TOBisa sigtf^ 

[0036] The objective in this stage of the compilation process is to determine as best possible which of the candidate

delation from Ae source co^ 
' degree dependant on the nature of the ha^^ 

code to the secondary processor 2 can be made where dataflows are [...] hardware representation can be used for each dataflow. It therefore follows that good choices of candidate dataflows for

mapping to the second[...]

THsteirtialteadifeve^ 

[0037] A powerful technique for [...]

[...] algorithm devised by Kaizhong Zhang [...]
[...]
[...] Algorithmica (1996) 15:205-222, Springer Verlag, and is provided as a toolkit by the University of Western Ontario,

trtttoc^ being altrml^ctfwriitogo 

ft wa be appreciate thai attend 

BYBTabtetomeetfledrrHaTrttap 
4S 100391 The prirw^rf operation of 2r^ 

adynajrcprograr^ 

cost of transformation is termed here an edit cost. The edit costs of successively larger subtrees are cross-compared, with a record being kept of the minimum costs found. The computational structure can be characterised as that of a recursive dynamic program which uses a working dynamic programming grid to calculate component subtree distances and records the result on the main grid.
[0040] The etB operate* ava3attea^ 
figure 4a sro^t*o trees: tree 1 51 wWi^ 

identical by addition of a node between nodes [...] of tree 151: this new node [...] Consequently transformation of tree 151 to tree 152 is achieved by insertion of this node, and transformation of tree 152 to tree 151 is achieved by deletion of it (in [...]
radiwby Ijpass-oraurti^ 
extreii^ low cosQ. Figum 
a oTfferert type of operas to each tree: ftte 
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the other/Every 

various types of rode posdie. 

[D041J Asprevkxjsly inctoedLea^ 

for example, the same result may be achieved in some architectures either by an insertion and a deletion or by a substitution: the costs of these different alternatives can be compared.

[0042] The result of the comparison of two trees by this algorithm is the production of a list of pairs of nodes (t1, t2), where t1 belongs to the first tree and t2 belongs to the second tree. Each pairing constitutes an identification of similar points in the two trees, suggesting the mapping of t1 and t2 on to each other. The list of pairs effectively defines the skeleton of a tree which can contain either of the compared trees: in this skeleton, to transform the first tree into the second tree, each node t1 has to be substituted with the respective t2. Nodes that do not occur in the mapping must be either inserted or deleted depending on which tree they belong to, as is discussed further below. For this list of pairs there will be defined an edit distance: [...]

one tree to the other. The algorithm is devised to determine an edit distance between two trees, together with the set of transformations which achieves that edit distance: alternative transformations will be possible, but they will have a higher associated cum[...]

[0043] The value of computing an edit distance based on edit costs is that the edit costs may be chosen to represent the "hardware cost" of reconfiguring the secondary processor from the configuration representing one tree to a configuration representing the other tree in a mapping. This "hardware cost" is typically a measure of the quantity of secondary processor resources that will [...]

can be considered, for example, in terms of the additional area of device used. These costs will be determined by the nature of the secondary processor hardware, as for different types of hardware the physical realisation of insertion, deletion and substitution operations will [...]
Twpass'cperataiiwor^ 

tions) has tow cost whe^ . 
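To make the idea of an edit distance driven by hardware-specific edit costs concrete, the sketch below computes a minimum edit cost between two ordered trees using caller-supplied insertion, deletion and substitution costs. It is not Zhang's algorithm as cited in the source: it is the textbook memoised recursion over ordered forests, swapped in because it is short enough to read. The tree encoding `(label, children)`, the function name and the cost values in the usage note are assumptions for illustration.

```python
from functools import lru_cache


def tree_edit_distance(t1, t2, ins_cost, del_cost, sub_cost):
    """Minimum cost of transforming ordered tree t1 into ordered tree t2.

    A tree is a tuple (label, children) where children is a tuple of trees.
    ins_cost(label) and del_cost(label) price insertion and deletion of a node;
    sub_cost(a, b) prices replacing operation a with operation b.
    """
    @lru_cache(maxsize=None)
    def dist(f, g):
        # f and g are forests: tuples of trees.
        if not f and not g:
            return 0
        if not g:
            label, kids = f[-1]
            return dist(f[:-1] + kids, g) + del_cost(label)
        if not f:
            label, kids = g[-1]
            return dist(f, g[:-1] + kids) + ins_cost(label)
        (lf, kf), (lg, kg) = f[-1], g[-1]
        return min(
            # Delete the root of the rightmost tree of f.
            dist(f[:-1] + kf, g) + del_cost(lf),
            # Insert the root of the rightmost tree of g.
            dist(f, g[:-1] + kg) + ins_cost(lg),
            # Map the two rightmost roots on to each other.
            dist(kf, kg) + dist(f[:-1], g[:-1]) + sub_cost(lf, lg),
        )

    return dist((t1,), (t2,))
```

A usage example with hypothetical costs: if insertion and deletion each cost 3 and substituting one operation for a different one costs 2, then a tree that differs from another by one extra node is at distance 3, and two trees of identical shape that differ in one operation are at distance 2. Changing the cost functions to reflect a particular secondary processor changes which trees count as close.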
[00441 Asirtlcatiriftocmaned&dsten^ 
taken: using Zhang* aigc^ 
eachcneofas^ 61 frees. This tamwmy can 

in Figure 5. Each leaf node 161 of the tree represents a candidate tree extracted from a DAG, and each intermediate node 162 represents an edit cost. The
between two leaf nodes of a prife 

ax^a. the edfttfstarra between aw 
edftclstarK» between Tree#i and Ti^ 
and<L 

(0045] ThistowrwrrwtelndU^tf 

onwrylsavalwtoletootas&canbei*^ 

The creation cl a taxonony to render 

tog^ (as wil be dsoissed below), and which are too d^ 

edadtetancettweshcUAgroupoftreescanbeseiec^ 

possible pair of trees in the group is less than the edit distance threshold. The value of the edit distance threshold is

arbitrary, and can be chosen [...]

in order to optimise the performance of the system.
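Selecting groups of trees whose pairwise edit distances all fall below a threshold can be sketched with a simple greedy pass. The greedy strategy, the function name `group_by_threshold` and the distances in the usage note are assumptions for illustration: the source describes reading groups from a taxonomy, and this sketch only demonstrates the pairwise-threshold criterion.

```python
def group_by_threshold(names, distance, threshold):
    """Greedily partition trees into groups with all pairwise distances below threshold.

    names: iterable of tree identifiers, visited in order.
    distance(a, b): edit distance between trees a and b.
    """
    groups = []
    for name in names:
        for group in groups:
            # Join the first group in which every member is close enough.
            if all(distance(name, other) < threshold for other in group):
                group.append(name)
                break
        else:
            groups.append([name])  # no compatible group: start a new one
    return groups
```

With hypothetical distances of 2 between trees T1 and T2, and 8 or more between either of them and T3, a threshold of 5 yields the groups `[T1, T2]` and `[T3]`. Raising the threshold merges more trees into one class at the price of a more general shared configuration.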

[0046] TheatantageofOTiBoDda^ 

whole group and will support the function of each tree. This is particularly appropriate for architectures, such as CHESS, in which low-latency partial reconfiguration [...]. Reconfiguration is required to [...]

ofarwtoertree:r«wrevei;astoe«dft^^ 

degree of lecofiguration requi^ 

dated togefrer by construct 

been corsAiK^ toe superfr^ 

source coda by retosertaitf to 

from toe ftfl supertree. The construction of the supertro 

[0047] Figure 6 illustrates the step of construction of a supertree from a group of trees which falls below the specified edit cost threshold: such a group of trees is here termed a class. The trees 171, 172 and 173 can all be mapped together into supertree 170. The reconfig[...]
example, tree 171 to that of tree 172 is suffic[...]
the two trees is below the edit cost threshold.
[0048] An exemplary supertree assembly [...]

algorithm is described below with reference to Figure 9. The algorithm contains the following elements:
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merge: ■ 

fl)0«J Ihet^ 

^aaequal number of mries> an 
5 100501 taeachsaratrro 

From the mapping between the source tree and the merge tree which has been calculated (in this embodiment from Zhang's algorithm [...]) [...] constructed as follows:

1. Firstly, mapped nodes closest to the root are considered;

2. The source tree operation (source operation) is concatenated to the corresponding mapped merge tree operation (merge operation);

3. For each child operation of the source operation -

a. If the child is mapped, revert to step 2 with respect to the source child.
b. If the child is not mapped, then con[...] is the root (source subtree).

i. If there is no further mapping, simply adopt the source subtree for merging into the merge tree under the corresponding merge tree node.

ii. If there is a further mapping inside the source subtree, connect the subtree as follows:

a. If the merge operation of this subordinate mapping falls outside the previously mapped subtree, remove the mapped source operation from the source tree. There is recursion present at this stage: where mapped children have already been dealt with, all that needs to be done is to remove what would otherwise be a cross tree link.

b. This is shown in Figure 9. If the merge operation of this subordinate mapping does fall within the previously mapped subtree, climb up the merge tree until the least common ancestor for all identified subordinate mappings is found. The least common ancestor is the first node to contain all of the source mappings. The unmapped source segment is then mapped into the merge tree by adding the source operation of the unmapped source subtree as a c[...] ancestor's parent, and by adding the least common ancestor as the [...] of the unmapped source operation just above the closest mapped source operation in the current subtree (where the "closest mapped source operation" delimits the lower end of an unmapped segment of the source tree, and is the mapped node which joins [...] the subtree of the current mapping: the source node's parent, which is unmapped, adopts the merge tree's least common ancestor as a child, and vice versa).

4. The pair of intermingled trees are normalised into a single tree, which forms the new merge tree.

The procedure continues until all the source trees in the class are contained within the merge tree, which is now a supertree.
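The simpler cases of the merging steps above can be sketched as follows. This sketch covers only concatenation of mapped operations, recursion into mapped children, and adoption of an unmapped subtree that contains no further mapping. The least-common-ancestor case and the final normalisation step are deliberately not handled. The `Node` class, the `merge` function and the representation of the mapping as a dict keyed by `id()` of source nodes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    ops: list                                     # operations held at this node
    children: list = field(default_factory=list)  # child nodes


def merge(source, target, mapping):
    """Merge the `source` subtree into the merge-tree node `target`.

    mapping: dict from id(source node) to the merge-tree node it maps to.
    Assumes an unmapped source child has no mapped descendants.
    """
    # Concatenate the source operation to the mapped merge operation.
    target.ops.extend(op for op in source.ops if op not in target.ops)
    for child in source.children:
        mapped = mapping.get(id(child))
        if mapped is not None:
            merge(child, mapped, mapping)   # mapped child: repeat for that pair
        else:
            target.children.append(child)   # unmapped child: adopt its subtree
    return target
```

In use, a merge tree `add(x, y)` combined with a source tree `sub(x, shl(z))`, with the roots and the `x` nodes mapped, gives a root supporting both `add` and `sub`, with children `x`, `y` and the adopted `shl(z)` subtree.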

[0051] This process is indicated in Figure 9. Figure 9a shows two dataflow trees, a merge [...]
There are three mappings made between nodes [...]

to be inserted appropriately. As indicated in section 1 above, the first step is to consider the mapped operations [...]

the root: in this case, at the root. These operations A are concatenated [...]

[0052] Secondly, the child nodes of A in the source tree are considered. Note [...]

an ancestor to any mappings: it is therefore merged as a child of A (see Figure 9b). The other child node of A, C, does however have descendant mappings (D and E, which map [...]
operations fall in the previously mapped [...]

the course set out in section 3 [...] (b) above. The least common ancestor containing both mapped merge operations D and E is X. C of the source tree is thus linked into the m[...]

This arrangement is shown in Figure 9b: the merging is completed by concatenation or merging of the remaining nodes of the source tree, all of which steps are straightforward.
10053] Theresuftantsupertree2Q3isshoiwm 

rfatantercandfclafte source tree 204 ft 
supertree node: merging is thus entirely straightforward, and consists only of concatenation (i.e. substitution). The process continues until all the candidate trees are merged into a supertree [...]
[0054] At this stage, it is possible to take [...]

secondary processor. The source code will contain DAGs other than those which have been selected for inclusion in the supertree: for example, DAGs which have not been considered because they do not lie at one of the most computationally intensive "hotspots" of the code [...]
ately adapted secondary p«x»s^ 

remaining DAGs with the supertree by a backmapping process. Processes derived from conventional backmapping
tecftuques,suctid&lbuig l <^ 
\ to used Zhang^aJgoriihiue^ 
to wifo a tower edit cost ftre^^ 
■ tree/orwhereewe(fflcostfw 

be allocated to the secondary processor and the supertree modified, if necessary. Context information related to any such dataflows added by this backmapping process needs to be stored also.

[0055] From this supertree, it is then straightforward to insert the minor links which were removed from the DAGs on their conversion into trees (including here any DAGs added from the backmapping process, if employed). The resulting
. stitictuefeattassdatefl^ 
.terthesLpsrtree(forexaiTp^^^ 
• fkwcanbeusedlcrtepuoposeo 
be used to provide a [...]
processor: these steps are described further below.
[D056] Stitching*^ tote 

not the class dataflow, as the supertree prescribes the periphery of the dataflow. The actions required with respect to any replaced dataflow in the source code are [...]

dataflow) with load primitives and of the output of the dataflow (root of the relevant tree) with a read. The leaves and roots of the relevant tree are [...]

code subsumed in the dataflow can simply be removed, as it is replaced by the secondary processor configuration.
IQ057] t^e7shawsa/bgSea!inter^ 
labelled Input Tree *3. Is showa 

unique operation ID obtained from the compiler internal form representation. For the supertree (PFU Tree), registers or other I/O resources are allocated to the leaves and the root. The [...]
tta provides a correspondence bef-^^ 
IreetnthelonnofaspeqHi^tirjaTheap 

remowai decode subsumed by a^ ' : 

[0058] From the class dataflow, it is possible to configure the secondary processor. This step can be conducted according to known approaches, by reduction of the class dataflow [...]

[...] at a time, and including in appropriate form any dynamic reconfiguration instructions), and then mapping the netlist to the specific secondary processor hardware [...]

flows. For conventional FPGA architectures, these steps can be carried out essentially by use of appropriate known tools. For example, [...]
49, can.teusectFu^y, the can be rendered hXEta 
fto configurable to^^^ 

resultant being converted to a configuration bitstream by the Xilinx MakeBits program. This approach is discussed, together with further discussion of provision of predetermined reconfiguration solutions, in "Run-Time Programming Method for Reconfigurable Computer" by Steve Casselman, currently available on the World Wide Web at
« httpiflwmwe^ 

ble on recof^raWeoorrpufing operas 
EEseri^sWJar procedures 

as the CHESS device deserted h AppendbcA* using tocfeappn*^ 

[005flJ me source cede regenerated in exec^^ 
» ^nce the secondary processor conffoura^ 

source cods beaacuted in the primary 

ondary processor tespecHicaJ^ adapt 

nTcantlyinOTasecLFbrexanple,a2S^ 

irven&mtotheiDCTaJgor^ 
S5 secondary processor because of I/O oonstrahst 

[0060] The methods here described are thus particularly effective [...]

on an architecture comprising a primary processor and a reconfigurable [...]
APPENDIX A
CHESS array

The CHESS array is a variety of field programmable array in which the programmable elements are not gates, as in an FPGA, but 4-bit arithmetic logic units (ALUs). The array configuration is described in detail in European Patent Application No. 97300563.0, and the ALU structure and provision of instruction to ALUs is discussed in a copending application entitled "Reconfigurable Processor Devices" and filed on the same date as the present [...]

The CHESS array consists of a chessboard layout with alternating squares comprising an ALU and a switchbox structure respectively. The configuration [...] is held in the ALU. Individual ALUs may be used in a process[...] implementation, provision is made to allow dynamic provision of instructions from one ALU to determine the function of a succeeding ALU. ALUs are 4-bit, with four identical bitslices, with 4-bit inputs A and B taken directly from an extensive [...] and 4-bit output U provided to the wiring network through an optionally latchable output register: 1-bit carry input and output are also provided and have their own interconnect [...]

Dynamic instructions are providable from the output U of one ALU to a 4-bit instruction input I of another ALU. The carry output C of one ALU can [...] with the effect of changing the instruction of that ALU.

The CHESS ALU is adapted to support multiplexing between A and B inputs, and also supports multiplexing between related instructions (eg OR/NOR, AND/NAND). Reconfiguration between such instructions can be achieved through appropriate use of the carry inputs and outputs without [...]. More complex reconfigurations (eg AND/XOR, Add/Sub) can be [...]

the two alternative instructions and the second to execute the chosen instruction on the operands. Multiplication will take up more than a single ALU, making reconfiguration [...]



to 
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involving amaliipBcato ft is straightfo^ ftenndtipkrar 

capacity of a CHESS ALU to "bypass* an operatkrti, with appropriate con^ in 
cither perfonnaireof ^ 

■ * ■ ■ • * *■ ."-**".' 

A sample set of functions obtainable from the instruction inputs is indicated in Table A1 below: a wide range of possibilities are available [...] appropriate logic in connection of the instruct[...]



[Table A1: the cell layout is not recoverable from the scan. Legible column heading: Carry In value. Legible function entries: XOR, NXOR, A AND B, A OR B, ADD, A, B, MATCH0, A NAND B, A NOR B, NOT A, NOT B, MATCH1, A EQUALS B, SUB.]

Table A1: Instruction bits and corresponding functions
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• 


Name          U function                [...] function

ADD           A plus B                  Arithmetic carry
SUB           A minus B                 Arithmetic carry
A AND B       Ui = Ai AND Bi            [...]
A OR B        Ui = Ai OR Bi             [...]
A NOR B       Ui = NOT(Ai OR Bi)        [...]
A XOR B       Ui = Ai XOR Bi            [...]
A NXOR B      Ui = NOT(Ai XOR Bi)       [...]
[...]         [...]                     [...]
[...]         Ui = (NOT Ai) AND Bi      [...]
[...]         Ui = (NOT Ai) OR Bi       [...]
[...] OR A    [...]                     [...]
A             Ui = Ai                   [...]
B             [...]                     [...]
NOT A         Ui = NOT Ai               [...]
NOT B         Ui = NOT Bi               [...]
A EQUALS B    Not applicable            if A == B then 0, else 1
MATCH1        Not applicable            bitwise AND of A and B, followed by OR across the width of the word
MATCH0        Not applicable            bitwise OR of A and B, followed by an AND across the width of the word

Table A2: Outputs for instructions



2s complement arithmetic is used, and the arithmetic carry is provided to be consistent with this arithmetic. The MATCH functions are so named because for MATCH [...]
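The legible rows of Table A2 can be modelled in software as a quick check of what each instruction computes on 4-bit operands. This is an illustrative sketch only: the dictionary `U_FUNCTIONS`, the helper names and the treatment of the EQUALS and MATCH results as a single output bit are assumptions for this example, and rows that are illegible in the table are omitted.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111 for a 4-bit ALU

# U output for the instructions whose table rows are legible.
# Inputs a and b are assumed to be integers in the range 0..15.
U_FUNCTIONS = {
    "ADD": lambda a, b: (a + b) & MASK,      # 2s complement, carry dropped
    "SUB": lambda a, b: (a - b) & MASK,
    "A AND B": lambda a, b: a & b,
    "A OR B": lambda a, b: a | b,
    "A NOR B": lambda a, b: ~(a | b) & MASK,
    "A XOR B": lambda a, b: a ^ b,
    "A NXOR B": lambda a, b: ~(a ^ b) & MASK,
    "NOT A": lambda a, b: ~a & MASK,
    "NOT B": lambda a, b: ~b & MASK,
}


def equals(a, b):
    """A EQUALS B: 0 if A == B, else 1."""
    return 0 if a == b else 1


def match1(a, b):
    """MATCH1: bitwise AND of A and B, then OR across the width of the word."""
    return int((a & b) != 0)


def match0(a, b):
    """MATCH0: bitwise OR of A and B, then AND across the width of the word."""
    return int(((a | b) & MASK) == MASK)
```

Read this way, MATCH1 reports whether any bit position holds a 1 in both operands, and MATCH0 reports whether every bit position holds a 1 in at least one operand.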
1. A method of compiling source code to a primary and a secondary processor, comprising:

selective extraction of dataflows from the source code;

transformation of the extracted dataflows into trees;
matching of the trees against each other [...]
tree into another;

determining a group [...]

creating for each group a generic dataflow [...];

[...] into the source code calls to the secondary processor for said group or plurality of groups of dataflows, and compiling the [...]

* Ametoddasdai^ 

town cdttcSstances jbrdasdnc^c^^ trees: 

* Ame^aedaimed^d^lordalriift 

to the architecture of the secondary processor, and represent a hardware cost of a corresponding reconfiguration of the secondary processor.

4. Amethodasdjfcnedfoaydd^itoa,^ 
torreconfigiialionofltesecon^^ 

5k Ama^asd^dwda^ 

6w Amethodastlairaedlnd^^ . 

7. AmefKdasdajmedtadata4,wh^ 

a AfiiortDdaadahwdiiiaiyofda 
executoof ^source code toawon 

a Atnethodas dakned iriary pecedfang daM wherein a ganertedate^ofa graftecalalaWty a^appredbata 
majjrirxjtf dmaftateinttefr 

10. A matfiodaBClairiWh any precede 
areredU^totreesbyremo^ 
a teat node and the root of a dmdWa^M.g^:. 

11i Amallxxlasdalmedlnd^ 

terpaarnumber of brtermedtete nodes. jj* 

12. Amethodasdaknedmc^ 



IS. Araefrbdascfaimedta 

fecoriiparedwilhfi*thente^ 
*r^n those of said *irtherdal^^ 



14. A method as claimed in any of claims 10 [...] where dependent on claim 9, wherein the removed links are stored after the directed [...]
merging of the trees of [...]
Figure 5
Figure 6a
Figure 7



EP 0926 594 A1

Figure 8

Figure 9b: Merged Composition. Figure 9c: New Supertree

Figure 9d: Next Candidate Mapping
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